Abstract

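Let $A$ be a positive semidefinite matrix, block partitioned as

$$A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix},$$

where $B$ and $D$ are square blocks. We prove the following inequalities for the
Schatten $q$-norm, which are sharp when the blocks are of size at least $2\times 2$:

$$\|A\|_q^q \le (2^q-2)\|C\|_q^q + \|B\|_q^q + \|D\|_q^q, \qquad 1\le q\le 2,$$

and

$$\|A\|_q^q \ge (2^q-2)\|C\|_q^q + \|B\|_q^q + \|D\|_q^q, \qquad 2\le q.$$

These bounds can be extended to symmetric partitionings into larger numbers of
blocks, at the expense of no longer being sharp:

$$\|A\|_q^q \le \sum_i \|A_{ii}\|_q^q + (2^q-2)\sum_{i<j}\|A_{ij}\|_q^q, \qquad 1\le q\le 2,$$

and

$$\|A\|_q^q \ge \sum_i \|A_{ii}\|_q^q + (2^q-2)\sum_{i<j}\|A_{ij}\|_q^q, \qquad 2\le q.$$

1 Introduction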



In [7], Bhatia and Kittaneh proved a number of interesting inequalities relating
the Schatten norms of a block partitioned operator to the Schatten norms of
its constituent blocks. Let the operator $T$ be written in block-matrix form as
$T = [T_{ij}]$, with $1\le i,j\le d$; then it is proven that, for example,

$$d^{2-q}\|T\|_q^q \le \sum_{i,j}\|T_{ij}\|_q^q \le \|T\|_q^q, \qquad 2\le q, \qquad (1)$$

and

$$d^{2-q}\|T\|_q^q \ge \sum_{i,j}\|T_{ij}\|_q^q \ge \|T\|_q^q, \qquad 1\le q\le 2. \qquad (2)$$

It is also shown there that these inequalities are sharp.

In the following a bound will be called sharp when it can be saturated for
any allowed choice of the constituent quantities of the bound. For example, in
(1), these quantities are the norms of the blocks $\|T_{ij}\|_q$, and sharpness means
here that for any set of non-negative scalars $t_{ij}$ an operator $T$ exists such
that $\|T_{ij}\|_q = t_{ij}$ and $\|T\|_q^q = \sum_{i,j} t_{ij}^q$. Phrased differently, a sharp bound is
the best possible bound exploiting a priori specified knowledge. This notion
of sharpness is stronger than the one used in [7]. Nevertheless, the second
inequality in both (1) and (2) is evidently sharp according to our definition as
well, as can be seen by taking a $T$ with blocks $T_{ij} = [t_{ij}] \oplus 0$.

Inequalities like (1) and (2) are sometimes called norm compression inequali- 
ties, because the full information contained in the operator is compressed into 
a smaller set of quantities, the norms of its blocks, and the inequalities give 
useful bounds on the norm of the full operator when only its compression is 
known. 

In the present work we restrict attention to positive semidefinite (PSD) matrices.
Under this extra restriction bounds (1) and (2) are no longer sharp.
Indeed, by just considering the case $q = 1$, which for positive matrices yields
nothing but the trace, we know that $\|T\|_1 = \sum_i \|T_{ii}\|_1$, and the off-diagonal
blocks should not contribute at all.

Known bounds of this form for PSD matrices and operators can be found in
[6,9] and [11]. The best-known norm compression inequality (although it does
not directly appear as such) is probably the pinching inequality [6], which holds
for any weakly unitarily invariant norm, and arbitrary self-adjoint operators:
for any block-partitioned self-adjoint operator $A = [A_{ij}] \ge 0$,

$$|||A||| \ge |||\oplus_i A_{ii}|||. \qquad (3)$$

For Schatten norms, this reduces to

$$\|A\|_q \ge \Big(\sum_i \|A_{ii}\|_q^q\Big)^{1/q}, \qquad (4)$$

which is indeed a norm compression inequality. In ([9], p. 217, Problem 22) one
can find a complementary inequality for PSD $2\times 2$ block matrices, also valid
for any unitarily invariant norm, and readily extendible to PSD $d\times d$ block
matrices:

$$|||A||| \le \sum_{i=1}^d |||A_{ii}|||. \qquad (5)$$

Here $|||A_{ii}|||$ is actually a shorthand for $|||A_{ii}\oplus 0|||$. That is, the blocks have
been implicitly filled out with zeroes to the same size as $A$. There is a very
simple proof of this inequality that also extends to operators:

Proof. Consider the $d = 2$ case only. The general case follows by repartitioning
the blocks iteratively. Fixing the diagonal blocks $A_{11}$ and $A_{22}$ fixes the RHS
of (5), and restricts $A$ to a convex set whose extremal points are of the form
$aa^*$, with $a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$ and $a_i a_i^* = A_{ii}$. Here $a_1$ and $a_2$ are blocks with an equal
number of columns. Because a norm, just as any convex function, reaches its
maximum over a convex set in an extremal point of that set, we only need to
check (5) for the extremal $A = aa^*$. Using the triangle inequality for norms,
and the fact that $aa^*$ is unitarily equivalent with $a^*a \oplus 0$, we indeed get:

$$|||A||| = |||aa^*||| = |||a^*a||| = \Big|\Big|\Big|\sum_{i=1}^2 a_i^*a_i\Big|\Big|\Big| \le \sum_{i=1}^2 |||a_i^*a_i||| = \sum_{i=1}^2 |||a_ia_i^*||| = \sum_{i=1}^2 |||A_{ii}|||.$$

□



Bounds (4) and (5) are sharp when the $q$-norms of the diagonal blocks only are
known. They are no longer sharp when the $q$-norms of all blocks are known,
as can be seen by considering the Frobenius norm (Schatten 2-norm). Indeed,
for that norm all blocks contribute evenly, while (4) and (5) only take the
diagonal blocks into account.



What we are looking for in this paper are sharp norm compression inequalities
for the Schatten norms of PSD block matrices, when the norms of all the blocks
are known, and not just the diagonal blocks. Bounds of this kind have been
discovered and proven by King [11] for PSD $2\times 2$ block matrices:

$$\|A\|_q \ge \left\| \begin{pmatrix} \|A_{11}\|_q & \|A_{12}\|_q \\ \|A_{21}\|_q & \|A_{22}\|_q \end{pmatrix} \right\|_q, \qquad 1\le q\le 2, \qquad (6)$$

and

$$\|A\|_q \le \left\| \begin{pmatrix} \|A_{11}\|_q & \|A_{12}\|_q \\ \|A_{21}\|_q & \|A_{22}\|_q \end{pmatrix} \right\|_q, \qquad 2\le q. \qquad (7)$$

That these bounds are sharp is easily seen by considering blocks $A_{ij}$ of the
form $A_{ij} = [a_{ij}] \oplus 0$, where $a_{ij}$ are non-negative scalars such that $a_{12} = a_{21}$
and $a_{11}a_{22} \ge a_{12}^2$. In fact, when the $A_{ij}$ are scalars, equality holds in (6) and
(7) throughout.



The obvious generalisation of (6) and (7) to higher numbers of blocks does not
hold for arbitrary $q$, although King has shown that $\|A\|_q \le \big\| [\,\|A_{ij}\|_q\,] \big\|_q$
holds for integer $q$ and any partitioning [12]. For non-integer $q$ there are already
counterexamples when the blocks $A_{ij}$ are scalars, in which case the norm
compression is just the elementwise absolute value, which we denote here by
$|A|$. For example, for the matrix






-2 


-2 


2 


2 


-1 


2 


3 





-1 





2 



one finds $\|A\|_{1.5} = 7.7617$ and $\|\,|A|\,\|_{1.5} = 7.9761$. We have not been able to
find counterexamples for $3\times 3$ partitionings, so it might be that (6) and (7)
still hold in that case.

The underlying reason for the failure of (6) and (7) in the general case seems
to be that a norm compression maps a matrix to an elementwise non-negative
matrix. The natural ordering for those matrices is the elementwise ordering
rather than the PSD ordering. Likewise, unitarily invariant norms, which involve
the eigenvalues of the matrix, do not seem to be the most natural choice
for norm compressions. That King's bounds can be formulated for $2\times 2$ (and
maybe $3\times 3$) partitionings using unitarily invariant norms is most likely a
coincidence.

The main result of the present paper is a set of sharp bounds that is complementary
to (6) and (7). That is, for $1\le q\le 2$ we find an upper bound, and
for $q\ge 2$ a lower bound on the $q$-norm of a $2\times 2$ partitioned PSD matrix,
given the $q$-norms of its blocks. These bounds are presented in Section 3. In
contrast to the bounds (6) and (7), our bounds can easily be generalised to
any symmetric partitioning, albeit at the expense of loss of sharpness.

Norm compression inequalities feature in proofs of the multiplicativity property
of the $1\to q$ norm of certain classes of completely positive maps. Letting
$\Phi$ be a completely positive (CP) map, this norm is defined as [1]

$$\|\Phi\|_{1\to q} = \max_{\|X\|_1 = 1} \|\Phi(X)\|_q, \qquad (8)$$

where $X$ is Hermitian. Multiplicativity of this norm w.r.t. the tensor product
is the statement that, for two CP maps $\Phi_1$ and $\Phi_2$ [1,2]:

$$\|\Phi_1\otimes\Phi_2\|_{1\to q} = \|\Phi_1\|_{1\to q}\,\|\Phi_2\|_{1\to q}. \qquad (9)$$

This basically says that the maximum in (8) for $\Phi = \Phi_1\otimes\Phi_2$ is achieved for
$X = X_1\otimes X_2$, where $X_j$ achieves the maximum in (8) for $\Phi_j$. Multiplicativity
(9) has been shown for various special classes of CP maps within various ranges
of $q$. Unfortunately, there exists a class of channels for which (9) does not hold
when $q > 4.79$ [17]. Despite this counterexample to the general statement, (9)
might still be true for any tensor product of CP maps for values of $q$ close to 1.
If this were true, one could prove additivity of an entropic counterpart of (9),
and with it a host of other additivity results concerning CP maps. That would
solve a number of long-standing open problems in quantum information theory
[5,15]. We intend to investigate the usefulness of our results in that setting in
future work.



2 Preliminaries

The Schatten $q$-norms, for $1\le q<\infty$, are the non-commutative generalisation
of the $\ell_q$ norms. For a general matrix or operator $A$,

$$\|A\|_q = \big(\mathrm{Tr}(|A|^q)\big)^{1/q},$$

which reduces for positive semidefinite matrices $A$ to

$$\|A\|_q = \big(\mathrm{Tr}(A^q)\big)^{1/q}.$$

We will use the positive semidefinite ordering on Hermitian matrices throughout,
denoted $A\ge B$, which means that $A - B\ge 0$. This ordering is preserved
under arbitrary conjugations: $A\ge B$ implies $XAX^*\ge XBX^*$ for arbitrary
$X$.



It is well-known that a $2\times 2$ block-matrix $A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix}$ with positive definite
$B$ and $D$ is positive semidefinite if and only if $B \ge CD^{-1}C^*$.

The set $S$ of Hermitian $C$ such that $\begin{pmatrix} B & C \\ C & D \end{pmatrix}$ is PSD has a unique maximum,
called the geometric mean of $B$ and $D$ [13,14]. For any $A, B > 0$, the geometric
mean of $A$ and $B$, denoted $A \# B$, is given by

$$A \# B = B \# A = A^{1/2}\big(A^{-1/2} B A^{-1/2}\big)^{1/2} A^{1/2}. \qquad (10)$$

For $A, B\ge 0$, the geometric mean is defined by

$$A \# B = \lim_{\epsilon\downarrow 0}\,(A + \epsilon 1) \# (B + \epsilon 1).$$

For $A$ and $B$ commuting, (10) reduces to $A \# B = (AB)^{1/2}$.
As basic properties, we need [3,4]:

• $C(A \# B)C^* = (CAC^*) \# (CBC^*)$;
• $(A \# B)^{-1} = A^{-1} \# B^{-1}$;
• $(A, B)\mapsto A \# B$ is jointly monotone in its arguments. That is: if $A_1\le A_2$
and $B_1\le B_2$, then also $A_1 \# B_1 \le A_2 \# B_2$.

We will also need the following Lemma: 

Lemma 1 For $A, B > 0$, the unique positive definite solution of the equation
$XA^{-1}X = B$ is given by $X = A \# B$.

Proof. From $XA^{-1}X = B$ it follows that $X$ is in the set $S$ of Hermitian
matrices $C$ for which $\begin{pmatrix} A & C \\ C & B \end{pmatrix} \ge 0$, hence $X \le A \# B$. It also follows that
$X^{-1}AX^{-1} = B^{-1}$, hence $X^{-1} \le A^{-1} \# B^{-1} = (A \# B)^{-1}$. Thus, if we restrict
to positive definite $X$, we find $X \ge A \# B$. Therefore, we actually have
equality: $X = A \# B$. □
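Formula (10) and Lemma 1 lend themselves to a quick numerical sanity check. The following is a minimal sketch, assuming NumPy is available and restricting to real symmetric matrices (a special case of the Hermitian setting); the helper names `sym_power`, `geometric_mean` and `random_pd` are illustrative and not taken from the text.

```python
import numpy as np

def sym_power(M, t):
    """M^t for a real symmetric positive definite M, via its eigendecomposition."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * w**t) @ V.T

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}, as in (10)."""
    A_half, A_minus_half = sym_power(A, 0.5), sym_power(A, -0.5)
    return A_half @ sym_power(A_minus_half @ B @ A_minus_half, 0.5) @ A_half

def random_pd(n, rng):
    """Illustrative helper: a well-conditioned random positive definite matrix."""
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)

rng = np.random.default_rng(0)
A, B = random_pd(3, rng), random_pd(3, rng)
X = geometric_mean(A, B)
inv = np.linalg.inv

# Lemma 1: X = A # B solves X A^{-1} X = B.
lemma1_residual = np.linalg.norm(X @ inv(A) @ X - B)
# Symmetry A # B = B # A, and the inversion property (A # B)^{-1} = A^{-1} # B^{-1}.
symmetry_residual = np.linalg.norm(X - geometric_mean(B, A))
inversion_residual = np.linalg.norm(inv(X) - geometric_mean(inv(A), inv(B)))
print(lemma1_residual, symmetry_residual, inversion_residual)
```

All three residuals should be at the level of floating-point rounding; this only illustrates the identities on one random instance and is of course no substitute for the proof.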

A generalisation of the geometric mean is the $\alpha$-power mean, for $0\le\alpha\le 1$
and $A, B > 0$:

$$A \#_\alpha B = A^{1/2}\big(A^{-1/2} B A^{-1/2}\big)^{\alpha} A^{1/2}.$$

A matrix function $f$ is operator monotone iff it preserves the PSD ordering,
i.e. $A\ge B$ implies $f(A)\ge f(B)$. If $A\ge B$ implies $f(A)\le f(B)$, we say $f$
is inversely operator monotone. A matrix function $f$ is operator convex iff for
all $0\le\lambda\le 1$ and for all $A, B\ge 0$,

$$f(\lambda A + (1-\lambda)B) \le \lambda f(A) + (1-\lambda) f(B).$$

If $-f$ is operator convex, we say $f$ is operator concave.

The primary matrix function $x\mapsto x^p$ is operator convex for $1\le p\le 2$,
operator monotone and operator concave for $0\le p\le 1$, and inversely operator
monotone and operator convex for $-1\le p\le 0$ [6].

We will also make use of the log-majorisation relation for positive $A$, $B$:

$$A \prec_{\log} B \iff \log A \prec \log B,$$

which implies weak majorisation $A \prec_w B$, and hence $|||A||| \le |||B|||$ for any
unitarily invariant norm.

Finally, we will use the $\delta_\infty$ metric on the positive cone, defined as

$$\delta_\infty(A,B) = \|\log \mathrm{Eig}(AB^{-1})\|_\infty$$

for $A, B > 0$. Here, $\mathrm{Eig}(A)$ is the vector of eigenvalues of $A$, and the norm
used is the $\ell_\infty$ vector norm. This metric is well-defined since, for $A, B > 0$,
$AB^{-1}$ has positive eigenvalues. We note that

$$\delta_\infty(A,B) = \max\big(|\log\lambda_1^{\downarrow}(AB^{-1})|,\ |\log\lambda_1^{\uparrow}(AB^{-1})|\big),$$

where $\lambda_1^{\downarrow}$ and $\lambda_1^{\uparrow}$ denote the largest and smallest eigenvalue, respectively.
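For readers who want to experiment with this metric, here is a small sketch of how $\delta_\infty$ could be evaluated numerically. It assumes NumPy and real symmetric positive definite inputs; the function name `delta_inf` and the Cholesky-based symmetrisation are choices made here for illustration, not part of the text.

```python
import numpy as np

def delta_inf(A, B):
    """max_i |log lambda_i(A B^{-1})| for real symmetric positive definite A, B."""
    # With B = L L^T, the matrix A B^{-1} is similar to the symmetric
    # matrix M = L^{-1} A L^{-T}, so its eigenvalues are real and positive.
    L = np.linalg.cholesky(B)
    Y = np.linalg.solve(L, A)          # Y = L^{-1} A
    M = np.linalg.solve(L, Y.T).T      # M = Y L^{-T}
    lam = np.linalg.eigvalsh((M + M.T) / 2)
    return float(np.max(np.abs(np.log(lam))))

rng = np.random.default_rng(0)
X1, X2 = rng.standard_normal((2, 4, 4))
A = X1 @ X1.T + 4 * np.eye(4)
B = X2 @ X2.T + 4 * np.eye(4)

d_ab = delta_inf(A, B)
d_ba = delta_inf(B, A)        # the eigenvalues of B A^{-1} are the inverses, so the distance is symmetric
d_scaled = delta_inf(2 * A, A)  # A B^{-1} = 2 * identity here
print(d_ab, d_ba, d_scaled)
```

The symmetry $\delta_\infty(A,B) = \delta_\infty(B,A)$ follows because inverting $AB^{-1}$ only flips the signs of the logarithms of its eigenvalues.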



3 Main Result 



Theorem 1 Let $A$ be a positive semidefinite block matrix

$$A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix},$$

where $B$ and $D$ are square blocks. Then we have the following bound on the
Schatten $q$-norm of $A$ for $1\le q\le 2$:

$$\|A\|_q^q \le (2^q-2)\|C\|_q^q + \|B\|_q^q + \|D\|_q^q. \qquad (11)$$

It is easy to see that, for $q = 1$ and for $q = 2$, equality holds. Indeed, for
$q = 1$, (11) reduces to $\mathrm{Tr}(A) = \mathrm{Tr}(B) + \mathrm{Tr}(D)$, and for $q = 2$, to $\mathrm{Tr}(A^2) =
\mathrm{Tr}(B^2) + 2\,\mathrm{Tr}(|C|^2) + \mathrm{Tr}(D^2)$. In this sense, (11) interpolates between these
two extremal cases.
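Inequality (11) and its two equality cases can be probed numerically. The sketch below, assuming NumPy, samples random real PSD matrices with a $2\times 2$ block $B$ and a $3\times 3$ block $D$ and records the gap between the right-hand and left-hand sides of (11); the names `trace_power` and `bound_gap` and the chosen sizes are illustrative assumptions.

```python
import numpy as np

def trace_power(M, q):
    """Tr |M|^q, i.e. the q-th power of the Schatten q-norm, from singular values."""
    return float(np.sum(np.linalg.svd(M, compute_uv=False) ** q))

def bound_gap(A, n, q):
    """Right-hand side minus left-hand side of (11), with an n x n leading block B."""
    B, C, D = A[:n, :n], A[:n, n:], A[n:, n:]
    rhs = (2**q - 2) * trace_power(C, q) + trace_power(B, q) + trace_power(D, q)
    return rhs - trace_power(A, q)

rng = np.random.default_rng(1)
gaps = []
for _ in range(200):
    X = rng.standard_normal((5, 5))
    A = X @ X.T                      # random PSD matrix
    for q in (1.0, 1.25, 1.5, 1.75, 2.0):
        gaps.append((q, bound_gap(A, 2, q)))

min_gap = min(g for _, g in gaps)
max_endpoint_gap = max(abs(g) for q, g in gaps if q in (1.0, 2.0))
print(min_gap, max_endpoint_gap)
```

On such samples the gap should be non-negative for all $q$ in $[1,2]$ and vanish (up to rounding) at $q=1$ and $q=2$. A random search of this kind can only fail to find counterexamples; it proves nothing.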

Using a standard duality argument, we find that for $q\ge 2$, inequality (11) is
reversed:

Corollary 1 For $q\ge 2$, and with $A$, $B$, $C$, $D$ as in Theorem 1,

$$\|A\|_q^q \ge (2^q-2)\|C\|_q^q + \|B\|_q^q + \|D\|_q^q. \qquad (12)$$





Proof. Consider the matrix $A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix}$ from Theorem 1. We will restrict
attention to the case where $B$ and $D$ are of equal size, so that $C$ is square.
Evidently, the blocks can always be filled out with zeroes to bring them to
this form without changing the validity of the bound. Furthermore, we restrict
to $C = C^*$ that are positive semidefinite. To see that this incurs no loss of
generality either, consider the polar decomposition of general $C$, $C = UC'$,
where $U$ is a unitary and $C'\ge 0$. Then

$$A' := \begin{pmatrix} U^* & 0 \\ 0 & 1 \end{pmatrix} A \begin{pmatrix} U & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} B' & C' \\ C' & D \end{pmatrix},$$

with $B' = U^*BU$. Clearly, $A$ and $A'$ have the same norm, and so do $B$ and
$B'$, and $C$ and $C'$. Therefore, in the following, we can take $C\ge 0$, so that all
occurrences of $\|\cdot\|_q^q$ can be written as $\mathrm{Tr}(\cdot)^q$.

Let $1\le q\le 2$ and let $p\ge 2$ be the conjugate power of $q$: $1/p + 1/q = 1$; the
exponent of the Corollary is played by $p$ in what follows. Hölder's
inequality for positive semidefinite $A$ and $B$ reads $\mathrm{Tr}[AB] \le \|A\|_p\|B\|_q$, with
equality if $B = A^{p-1}$. This allows one to express the norm $\|A\|_p$ as the supremum
of $\mathrm{Tr}[AB]$ over all $B\ge 0$ for which $\|B\|_q = 1$. In other words, for every
$A\ge 0$ there exists an optimal $B\ge 0$ with $\|B\|_q = 1$ such that $\|A\|_p = \mathrm{Tr}[AB]$,
and for all other $B\ge 0$ with $\|B\|_q = 1$ one has $\|A\|_p \ge \mathrm{Tr}[AB]$. As the optimal
$B$ is given by $A^{p-1}/\|A^{p-1}\|_q$, one can always safely assume that the
optimal $B$ has the same direct sum structure as $A$ has.

Now consider the expression

$$\|B\oplus D\oplus (2^p-2)^{1/p}C\|_p. \qquad (13)$$

Let $P$, $Q$ and $R$ be positive semidefinite matrices such that $P\oplus R\oplus (2^q-2)^{1/q}Q$
is optimal for the norm in (13) in the abovementioned sense. That is:

$$\|P\oplus R\oplus (2^q-2)^{1/q}Q\|_q = 1,$$

and

$$\|B\oplus D\oplus (2^p-2)^{1/p}C\|_p = \mathrm{Tr}\big[(B\oplus D\oplus (2^p-2)^{1/p}C)(P\oplus R\oplus (2^q-2)^{1/q}Q)\big]
= \mathrm{Tr}\big[BP + DR + (2^p-2)^{1/p}(2^q-2)^{1/q}CQ\big].$$

Now notice that for all $q$, $(2^p-2)^{1/p}(2^q-2)^{1/q}\le 2$, with equality in $q = 2$.
Thus

$$\|B\oplus D\oplus (2^p-2)^{1/p}C\|_p \le \mathrm{Tr}[BP + DR + 2CQ] = \mathrm{Tr}\left[\begin{pmatrix} B & C \\ C & D \end{pmatrix}\begin{pmatrix} P & Q \\ Q & R \end{pmatrix}\right].$$

On the other hand, from $\|P\oplus R\oplus (2^q-2)^{1/q}Q\|_q = 1$ and Theorem 1, it follows
that $\left\|\begin{pmatrix} P & Q \\ Q & R \end{pmatrix}\right\|_q \le 1$. Thus, using Hölder's inequality, we may conclude
that

$$\mathrm{Tr}\left[\begin{pmatrix} P & Q \\ Q & R \end{pmatrix}\begin{pmatrix} B & C \\ C & D \end{pmatrix}\right] \le \left\|\begin{pmatrix} B & C \\ C & D \end{pmatrix}\right\|_p,$$

which proves the inequality (12) of the Corollary. □
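The scalar inequality $(2^p-2)^{1/p}(2^q-2)^{1/q}\le 2$ for conjugate exponents, used in the proof above, is easy to tabulate. A minimal sketch in plain Python follows; the function names are illustrative.

```python
def conjugate(q):
    """Conjugate exponent p with 1/p + 1/q = 1, for q > 1."""
    return q / (q - 1.0)

def coupling(q):
    """(2^p - 2)^(1/p) * (2^q - 2)^(1/q) for conjugate exponents p and q."""
    p = conjugate(q)
    return (2.0**p - 2.0) ** (1.0 / p) * (2.0**q - 2.0) ** (1.0 / q)

values = {q: coupling(q) for q in (1.1, 1.25, 1.5, 1.75, 2.0, 3.0, 5.0)}
for q, v in values.items():
    print(f"q = {q:4.2f}  p = {conjugate(q):6.3f}  product = {v:.6f}")
```

The product is symmetric under exchanging $p$ and $q$, stays below 2 on this grid, and reaches 2 at $p = q = 2$.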



We can combine (11) with (2), applied to the $C$ block, to generalise our bounds
to general $d\times d$ partitionings, by repartitioning the $B$ and $C$ blocks recursively.

Corollary 2 For any PSD matrix $A$, partitioned into $d\times d$ blocks $A_{ij}$ such
that the diagonal blocks are square,

$$\|A\|_q^q \le \sum_i \|A_{ii}\|_q^q + (2^q-2)\sum_{i<j}\|A_{ij}\|_q^q, \qquad 1\le q\le 2, \qquad (14)$$

and

$$\|A\|_q^q \ge \sum_i \|A_{ii}\|_q^q + (2^q-2)\sum_{i<j}\|A_{ij}\|_q^q, \qquad 2\le q. \qquad (15)$$





c = 






^ d o\ 













The proof of (12) extends without essential changes to (15). 

Concerning sharpness, we first have to mention that for blocks of size $1\times 1$,
our bounds are not sharp, quite simply because King's bounds (6) and (7) are
equalities in that case. For blocks of size $2\times 2$ (and larger), our bounds (11)
and (12) are sharp, as witnessed by blocks of the form



B 



where $b$, $c$ and $d$ are non-negative numbers. In Section 4, however, we show that
(14) is not sharp. It would be interesting to find better bounds for that case,
but at this point it is not clear to us whether this question has a reasonable
answer.
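To make the sharpness claim for $2\times 2$ blocks concrete, the sketch below checks one family of blocks for which (11) holds with equality. The particular choice $B = \mathrm{diag}(c,b)$, $C = \mathrm{diag}(c,0)$, $D = \mathrm{diag}(c,d)$ is an assumption made here for illustration and need not coincide with the witness intended in the text; NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np

def trace_power(M, q):
    """Tr |M|^q computed from the singular values of M."""
    return float(np.sum(np.linalg.svd(M, compute_uv=False) ** q))

def saturating_blocks(b, c, d):
    """Assumed witness: B = diag(c, b), C = diag(c, 0), D = diag(c, d)."""
    return np.diag([c, b]), np.diag([c, 0.0]), np.diag([c, d])

def both_sides(b, c, d, q):
    """Left-hand and right-hand side of (11) for the assumed witness."""
    B, C, D = saturating_blocks(b, c, d)
    A = np.block([[B, C], [C.T, D]])   # eigenvalues of A: 2c, 0, b, d
    lhs = trace_power(A, q)
    rhs = (2**q - 2) * trace_power(C, q) + trace_power(B, q) + trace_power(D, q)
    return lhs, rhs

for q in (1.2, 1.5, 1.8, 3.0):
    lhs, rhs = both_sides(b=0.3, c=1.1, d=2.0, q=q)
    print(f"q = {q}: lhs = {lhs:.10f}, rhs = {rhs:.10f}")
```

For this family $A$ has eigenvalues $2c$, $0$, $b$ and $d$, so both sides equal $2^q c^q + b^q + d^q$ for every $q$, which is why the same blocks saturate (11) and (12).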

To prove the central technical result (11), we can, just as in the proof of
Corollary 1, w.l.o.g. restrict attention to the case where block $C$ is square and
positive semidefinite. Inequality (11) can then be reformulated in a way that
sheds light on the somewhat curious factor of $2^q - 2$. Note, namely, that

$$\mathrm{Tr}\begin{pmatrix} C & C \\ C & C \end{pmatrix}^q = 2^q\,\mathrm{Tr}\,C^q,$$

and

$$\mathrm{Tr}\begin{pmatrix} C & 0 \\ 0 & C \end{pmatrix}^q = 2\,\mathrm{Tr}\,C^q.$$

Hence, (11) can be written as

$$\mathrm{Tr}\begin{pmatrix} B & C \\ C & D \end{pmatrix}^q - \mathrm{Tr}\begin{pmatrix} B & 0 \\ 0 & D \end{pmatrix}^q \le \mathrm{Tr}\begin{pmatrix} C & C \\ C & C \end{pmatrix}^q - \mathrm{Tr}\begin{pmatrix} C & 0 \\ 0 & C \end{pmatrix}^q. \qquad (16)$$

It is clear that both sides are non-negative, since $\begin{pmatrix} B & 0 \\ 0 & D \end{pmatrix}$ is a pinching of
$\begin{pmatrix} B & C \\ C & D \end{pmatrix}$, and weakly unitarily invariant norms, such as the Schatten norms,
are non-increasing under pinchings. The difference expressed by the left-hand
side is thus the amount of norm decrease caused by this particular pinching,
and the inequality says that, when fixing $C$ and constraining $B$ and $D$ to keep
$A$ PSD, this norm decrease is maximal when $B = D = C$.



The Proof of Theorem 1 will be given in Sections 5 to 8. 



4 Bound (14) is not sharp

In this Section, we consider the generalisation (14) of our bound to general
partitionings, and show that it is no longer sharp. We consider a particular
class of PSD matrices $(A_{ij})_{ij}$ for which every block has the same $q$-norm:
$\|A_{ij}\|_q = a$. We first show that this implies that all blocks have the same
absolute value.

Consider the blocks $A_{ii}$, $A_{jj}$ and $A_{ij}$ for some $i<j$. Non-negativity of $A$
implies that $A_{ii} \ge A_{ij}A_{jj}^{-1}A_{ij}^*$. Since all blocks have the same norm, we actually
must have equality.



Lemma 2 For a PSD block matrix $A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix} \ge 0$, the equality $\|B\|_q =
\|C\|_q = \|D\|_q$ implies $B = CD^{-1}C^*$.

Proof. Suppose there was a $\Delta\ge 0$ for which $B = CD^{-1}C^* + \Delta$. By [6], (IV.53),
$|||(A + B)\oplus 0||| \ge |||A\oplus B|||$ for $A, B\ge 0$, hence $\mathrm{Tr}(A+B)^q \ge \mathrm{Tr}\,A^q + \mathrm{Tr}\,B^q$.
For finite $q$ this means that $\|A+B\|_q$ is strictly larger than $\|A\|_q$ when $B$
is non-zero. Specifically, if $\Delta$ is non-zero, we find $\|B\|_q > \|CD^{-1}C^*\|_q$. Using
a Theorem of Horn and Mathias [10], $\|CD^{-1}C^*\|_q \ge \|C\|_q^2\,\|D\|_q^{-1}$, hence the
non-vanishing of $\Delta$ implies $\|B\|_q > \|C\|_q^2\,\|D\|_q^{-1}$, which violates the
statement that $\|B\|_q = \|C\|_q = \|D\|_q = a$. Therefore, $\Delta$ must be zero. □

Using King's inequality (6), we can strengthen this further.

Lemma 3 For a PSD block matrix $A = \begin{pmatrix} B & C \\ C^* & D \end{pmatrix} \ge 0$, the equality $\|B\|_q =
\|C\|_q = \|D\|_q = a$, $1<q<2$, implies $B = D = UC$, where $U$ is a unitary
commuting with $D$. Thus, in some basis, $B$, $C$ and $D$ are diagonal, and $B =
|C| = D$.

Proof. From the previous Lemma, we already know that $B = CD^{-1}C^*$. Using
(6), we find

$$\left\|\begin{pmatrix} CD^{-1}C^* & C \\ C^* & D \end{pmatrix}\right\|_q \ge \left\|\begin{pmatrix} a & a \\ a & a \end{pmatrix}\right\|_q = 2a.$$

The left-hand side is equal to $\|D + D^{-1/2}C^*CD^{-1/2}\|_q$. By the triangle inequality,

$$\|D + D^{-1/2}C^*CD^{-1/2}\|_q \le \|D\|_q + \|D^{-1/2}C^*CD^{-1/2}\|_q
= \|D\|_q + \|CD^{-1}C^*\|_q = \|D\|_q + \|B\|_q = 2a.$$

Combining these two inequalities, we find that equality holds. Now, by the
Lemma below, this implies $D = tB$, with, in particular, $t = 1$, thus $D = B$.
This further implies $CD^{-1}C^* = D$ and also $D^{-1/2}C^*CD^{-1/2} = D$. From the
latter equation we find $|C| = D$. The polar decomposition of $C$ must therefore
be $C = UD$. Inserting this in the former equation yields $UDU^* = D$, so that
$U$ must commute with $D$. □

Lemma 4 For given matrices $A$, $B$, equality in the triangle inequality for
$q$-Schatten norms with $1<q<2$,

$$\|A+B\|_q = \|A\|_q + \|B\|_q,$$

implies $A = tB$, for some $t\ge 0$.

Proof. By convexity of norms, for all $\lambda\in[0,1]$,

$$\|\lambda A + (1-\lambda)B\|_q \le \lambda\|A\|_q + (1-\lambda)\|B\|_q.$$

Then $\|A+B\|_q = \|A\|_q + \|B\|_q$ implies equality for all $\lambda$, and by dividing
both sides by $\lambda$, we get

$$\|A + tB\|_q = \|A\|_q + t\|B\|_q,$$

where $t = (1-\lambda)/\lambda \ge 0$. Choosing $t$ equal to $\|A\|_q/\|B\|_q$ and setting $B' = tB$,
we get, in particular, $\|A\|_q =: a$, $\|B'\|_q = a$, and $\|A+B'\|_q = 2a$. Inserting this
in the "hard" Clarkson-McCarthy inequality [16], which is valid for $1\le q\le 2$:

$$\|A+B'\|_q^p + \|A-B'\|_q^p \le 2\big(\|A\|_q^q + \|B'\|_q^q\big)^{p/q},$$

with $1/p + 1/q = 1$, gives, for $q>1$ (i.e. finite $p$),

$$\|A-B'\|_q^p \le \big(2\cdot 2^{p/q} - 2^p\big)a^p = 0,$$

whence it follows that $A = tB$. □

So, we now can already conclude that $A = (A_{ij})_{ij}$ must in a certain basis be
of the form

$$A = \bigoplus_j x_j X_j,$$

with $x_j\ge 0$ such that $\sum_j x_j^q = a^q$, and $X_j$ $d\times d$ PSD matrices whose elements
all have modulus 1. Now, this can only be if the $X_j$ are rank 1, as can be seen
by noting that $\mathrm{Tr}(X_j/d)^2 = \mathrm{Tr}(X_j/d)$. Thus, $\|A\|_q^q = \sum_j x_j^q d^q = a^q d^q$. On the
other hand, (14) gives $\|A\|_q^q \le (2^q-2)(d(d-1)/2)a^q + d\,a^q$. As this is strictly
larger than $a^q d^q$ for $1<q<2$ and $d>2$, this shows that (14) is not sharp.
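The closing comparison is plain arithmetic and can be tabulated directly. The following sketch in plain Python evaluates the exact value $a^q d^q$ and the right-hand side of (14) for the equal-norm class; the function names and the sample values of $d$ and $q$ are illustrative.

```python
def exact_value(d, q, a=1.0):
    """||A||_q^q = a^q d^q for the equal-norm class considered above."""
    return (a * d) ** q

def bound_14(d, q, a=1.0):
    """Right-hand side of (14) when every one of the d*d blocks has q-norm a."""
    return ((2**q - 2) * d * (d - 1) / 2 + d) * a**q

slack = {
    (d, q): bound_14(d, q) - exact_value(d, q)
    for d in (2, 3, 4, 8)
    for q in (1.25, 1.5, 1.75)
}
for (d, q), s in sorted(slack.items()):
    print(f"d = {d}, q = {q}: bound - exact = {s:.6f}")
```

For $d = 2$ the slack vanishes, consistent with the sharpness of (11); for the sampled $d\ge 3$ it is strictly positive.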



5 Proof of Theorem 1 



We only have to prove (16) for $1<q<2$. The cases $q = 1$ and $q = 2$ are
trivial, as noted before. Furthermore, we only have to deal with the case where
all blocks are square and of the same size. We can easily generalise our Main
Theorem to non-square $C$ blocks, by filling out the smaller blocks with zeroes
to the required size.

We deal first with the case that $B$ and $D$ are bounded and positive definite,
and leave the remaining cases for last (cfr. Proposition 2).

Let us consider the left-hand side of (16) and effectively calculate its maximum
value. We start by maximising it over $B$. The constraint on $B$, originating from
the requirement $A\ge 0$, is $B\ge CD^{-1}C$. We will now show that the maximum
over $B$ is obtained in $B = B_0 := CD^{-1}C$. Let us thereto put $B = B_0 + t\Delta$,
with $\Delta\ge 0$, and define

$$f(t) := \mathrm{Tr}\begin{pmatrix} B_0 + t\Delta & C \\ C & D \end{pmatrix}^q - \mathrm{Tr}\begin{pmatrix} B_0 + t\Delta & 0 \\ 0 & D \end{pmatrix}^q.$$

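The derivative of $f$ is given by

$$f'(t) = q\,\mathrm{Tr}\left[\left(\begin{pmatrix} B & C \\ C & D \end{pmatrix}^{q-1} - \begin{pmatrix} B & 0 \\ 0 & D \end{pmatrix}^{q-1}\right)\begin{pmatrix} \Delta & 0 \\ 0 & 0 \end{pmatrix}\right],$$

with $B = B_0 + t\Delta$. Introducing the projector $P = 1\oplus 0$, we can write

$$f'(t) = q\,\mathrm{Tr}\left[\left(P\begin{pmatrix} B & C \\ C & D \end{pmatrix}^{q-1}P - \left(P\begin{pmatrix} B & C \\ C & D \end{pmatrix}P\right)^{q-1}\right)\begin{pmatrix} \Delta & 0 \\ 0 & 0 \end{pmatrix}\right].$$

For $1<q<2$, the function $x\mapsto g(x) = x^{q-1}$ is operator concave on $[0,+\infty)$,
and $g(0) = 0$. Therefore ([6], Theorem V.2.3)

$$P\begin{pmatrix} B & C \\ C & D \end{pmatrix}^{q-1}P \le \left(P\begin{pmatrix} B & C \\ C & D \end{pmatrix}P\right)^{q-1}.$$

This shows that $f'(t)\le 0$ and that $f(t)$ is indeed maximal in $t = 0$. Therefore, we
can henceforth put $B = CD^{-1}C$.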



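Define $f(D)$ as

$$f(D) := \mathrm{Tr}\begin{pmatrix} CD^{-1}C & C \\ C & D \end{pmatrix}^q - \mathrm{Tr}\begin{pmatrix} CD^{-1}C & 0 \\ 0 & D \end{pmatrix}^q. \qquad (17)$$

Since

$$\begin{pmatrix} CD^{-1}C & C \\ C & D \end{pmatrix} = \begin{pmatrix} CD^{-1/2} \\ D^{1/2} \end{pmatrix}\begin{pmatrix} D^{-1/2}C & D^{1/2} \end{pmatrix} \qquad (18)$$

and $CD^{-1}C$ has the same spectrum as $D^{-1/2}C^2D^{-1/2}$, we can rewrite $f(D)$
as

$$f(D) = \mathrm{Tr}(G + D)^q - \mathrm{Tr}\,G^q - \mathrm{Tr}\,D^q, \qquad (19)$$

where we have introduced

$$G := D^{-1/2}C^2D^{-1/2}. \qquad (20)$$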

A short (numerical) calculation reveals that $f(D)$ is neither convex nor concave,
not even in the scalar case ($C$ and $D$ scalars).

To perform the maximisation of $f(D)$ over all possible $D > 0$, we calculate
the gradient and stationary points of $f(D)$. We replace $D$ by $D + tX$, with
Hermitian $X$, and calculate the Fréchet derivative of (17):

$$\frac{d}{dt}\Big|_{t=0} f(D + tX) = q\,\mathrm{Tr}\Big[X\,D^{-1/2}\Big(D\big((D+G)^{q-2} - D^{q-2}\big)D - G\big((D+G)^{q-2} - G^{q-2}\big)G\Big)D^{-1/2}\Big].$$

In this calculation we have used the approximation

$$(D + tX)^{-1} = D^{-1/2}\big(1 + tD^{-1/2}XD^{-1/2}\big)^{-1}D^{-1/2} = D^{-1} - tD^{-1}XD^{-1} + O(t^2),$$

the expression for the Fréchet derivative of the power function

$$\frac{d}{dt}\Big|_{t=0}\mathrm{Tr}(A + t\Delta)^q = q\,\mathrm{Tr}(A^{q-1}\Delta),$$

and the equality

$$\begin{pmatrix} CD^{-1}C & C \\ C & D \end{pmatrix}^p = \begin{pmatrix} CD^{-1/2} \\ D^{1/2} \end{pmatrix}\big(D^{-1/2}C^2D^{-1/2} + D\big)^{p-1}\begin{pmatrix} D^{-1/2}C & D^{1/2} \end{pmatrix}$$

for all $p$, which follows from (18). Therefore, the gradient of $f(D)$ is given by
the expression

$$\nabla f(D) = q\,D^{-1/2}\Big[D\big((D+G)^{q-2} - D^{q-2}\big)D - G\big((D+G)^{q-2} - G^{q-2}\big)G\Big]D^{-1/2}, \qquad (21)$$

and $D$ is a stationary point of $f(D)$ if and only if this gradient is zero. This
clearly shows that the gradient of $f$ is well-defined and continuous in the interior
of the positive semidefinite cone $S$. It is also clear that $D = C$, implying
that also $G = C$, is a stationary point.
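In the scalar case everything above becomes explicit, which makes for a convenient sanity check of (19) and (21). The sketch below, in plain Python, treats $C = c$ and $D = d$ as positive numbers, so that $G = c^2/d$; the function names and the sample values are illustrative assumptions.

```python
def f_scalar(d, c, q):
    """Scalar version of (19): f(d) = (g + d)^q - g^q - d^q with g = c^2 / d."""
    g = c * c / d
    return (g + d) ** q - g**q - d**q

def grad_scalar(d, c, q):
    """Scalar specialisation of the gradient formula (21), with p = q - 2."""
    g = c * c / d
    p = q - 2.0
    return q / d * (d * d * ((d + g) ** p - d**p) - g * g * ((d + g) ** p - g**p))

c, q = 1.3, 1.5

# Compare (21) with a central finite difference at a generic point.
d0, step = 0.7, 1e-6
finite_difference = (f_scalar(d0 + step, c, q) - f_scalar(d0 - step, c, q)) / (2 * step)
gradient_error = abs(finite_difference - grad_scalar(d0, c, q))

# d = c is stationary, and f(c) = (2^q - 2) c^q is the largest value on a grid.
grid = [c * 10 ** (k / 20.0) for k in range(-40, 41)]
f_max_on_grid = max(f_scalar(d, c, q) for d in grid)
f_at_c = f_scalar(c, c, q)
print(gradient_error, grad_scalar(c, c, q), f_at_c, (2**q - 2) * c**q, f_max_on_grid)
```

The grid search only illustrates, for one choice of $c$ and $q$, the claim that $D = C$ is the maximiser; the general matrix statement is what the remainder of the proof establishes.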

The global maximum of $f$ must either be a stationary point, a singular point,
or a boundary point. As the gradient of $f$ is well-defined in the interior of $S$,
$f$ has no singular points. In the following Sections we prove that $D = G = C$
is the only stationary point of $f$. More precisely, in Sections 6 and 7 we will
prove the following Proposition:

Proposition 1 For $p$ in the range $-1<p<1$, $p\neq 0$, and for $D > 0$, the
equation in $G$

$$D\big((D+G)^p - D^p\big)D - G\big((D+G)^p - G^p\big)G = 0$$

has one solution over the positive definite matrices, namely $G = D$.

Since we are dealing with values $1<q<2$, this Proposition applies with
$p = q - 2$.

Finally, we show in Section 8 that the values of $f$ on the boundary of $S$ are
not greater than $f(C)$. This is proven in an inductive way, as follows:

Proposition 2 Assuming $f(D)\le f(C)$ holds for all $C$ and $D$ of size $d'\times d'$,
$f(D)\le f(C)$ also holds for $d\times d$ matrices $D$ that are bounded and invertible
on a $d'$-dimensional subspace of the full $d$-dimensional space (with $d' < d$).

Using induction on the size $d$ of the blocks, these two Propositions allow us
to conclude that $D = C$, the "only stationary point in town", is the global
maximum of $f(D)$, so that $f(D)\le f(C)$ for all $D\ge 0$, which is what we
needed to show. This finishes the proof of Theorem 1.



6 Uniqueness of the stationary point

In this and the following Section, we present the proof of Proposition 1. We
consider the equation

$$D\big((D+G)^p - D^p\big)D = G\big((D+G)^p - G^p\big)G \qquad (22)$$

over $G > 0$, and we will show that $G = D$, implying $G = D = C$, is its only
solution for values of $p$, $-1<p<1$, $p\neq 0$.

We start with the case $0<p<1$. Applying Lemma 1, (22) is equivalent with

$$G = \Big(D\big((D+G)^p - D^p\big)D\Big) \# \big((D+G)^p - G^p\big)^{-1},$$

and we define the map $\Phi_D$ that maps $G$ to the matrix expressed by the right-hand
side of this equation:

$$G \mapsto \Phi_D(G) = \Big(D\big((D+G)^p - D^p\big)D\Big) \# \big((D+G)^p - G^p\big)^{-1}. \qquad (23)$$

For the case $-1<p<0$, $(D+G)^p - D^p$ and $(D+G)^p - G^p$ are negative, and
we now find

$$G = \Big(D\big(D^p - (D+G)^p\big)D\Big) \# \big(G^p - (D+G)^p\big)^{-1}.$$

The sign changes, as compared to (23), are necessary for the geometric mean
to have positive definite arguments. Therefore, in that case, we define $\Phi_D$ as

$$G \mapsto \Phi_D(G) = \Big(D\big(D^p - (D+G)^p\big)D\Big) \# \big(G^p - (D+G)^p\big)^{-1}. \qquad (24)$$

To prove that (22) has only one solution, we will show that $\Phi_D$ has only one
fixed point (namely $G = D$) for $-1<p<1$, $p\neq 0$. The way we will do this
is by showing that $\Phi_D$ is "contractive w.r.t. the fixed point $D$". Endowing the
cone of positive semidefinite matrices $S$ with the metric $\delta_\infty$, contractivity of
$\Phi_D$ w.r.t. $D$ means the inequality

$$\delta_\infty(\Phi_D(G), D) \le \beta\,\delta_\infty(G, D), \qquad (25)$$

where the "Lipschitz constant" $\beta$ is strictly less than 1. This statement resembles
the definition of contractivity of a map $\Phi$, which says that, for all $G$ and
$G'$, $\delta_\infty(\Phi(G),\Phi(G')) \le \beta\,\delta_\infty(G,G')$, with Lipschitz constant $\beta<1$. By the
contraction mapping principle, contractive maps have a unique fixed point in
$S$. Similarly, the weaker statement (25) is already enough to show that $D$ is
the unique fixed point of $\Phi_D$. Indeed, suppose there is another fixed point $D'$:
$\Phi_D(D') = D'$. Taking $G = D'$ in (25) then yields $\delta_\infty(D',D) \le \beta\,\delta_\infty(D',D)$,
which can only be true if $\delta_\infty(D',D) = 0$, i.e. $D' = D$.

7 Contractivity of the map $\Phi_D$

We will now prove that when $-1<p<1$, (25) holds with $\beta = p/(2^{p+1}-2)$,
which is strictly less than 1 for $-1<p$. If the map $\Phi_D$ had been
operator monotone, this would have allowed us to straightforwardly reduce
the problem to the scalar case. However, the subexpression $\big((D+G)^p - G^p\big)^{-1}$
is not monotone in $G$. Nevertheless, monotonicity holds in the following very
restricted sense, and this will turn out to be just enough for our purposes.

Lemma 5 Let $A$, $B$ be positive semidefinite and $k$ a positive scalar.

For $0<p\le 1$:

$$A\le kB \ \text{ implies } \ (A+B)^p - A^p \ge (kB+B)^p - (kB)^p \ge 0.$$

For $-1\le p<0$, the orderings are reversed:

$$A\le kB \ \text{ implies } \ (A+B)^p - A^p \le (kB+B)^p - (kB)^p \le 0.$$

As a side remark, we note that, for instance for $0<p<1$, $A\ge kB$ does not
imply $(A+B)^p - A^p \le (kB+B)^p - (kB)^p$.

Proof. We note first that $A + B$ can be written as the convex combination
$\lambda(k+1)B + (1-\lambda)((k+1)/k)A$, with $\lambda = 1/(k+1)$.

By operator concavity of the function $x\mapsto x^p$, $0<p\le 1$, we then have

$$(A+B)^p \ge \lambda(k+1)^pB^p + (1-\lambda)\big((k+1)/k\big)^pA^p,$$

so that

$$(kB+B)^p - (A+B)^p \le \Big(\frac{k}{k+1}\Big)^{1-p}\big((kB)^p - A^p\big).$$

Since $x\mapsto x^p$, $0<p\le 1$, is also operator monotone, $(kB)^p - A^p\ge 0$. For $p\le 1$
and $k>0$, the factor $(k/(k+1))^{1-p}$ is $\le 1$, so that $(kB+B)^p - (A+B)^p \le
(kB)^p - A^p$ follows, which is equivalent to the first inequality of the Lemma.

For the second case, $-1\le p<0$, we proceed in exactly the same way, but
now exploiting the operator convexity and inverse monotonicity of $x\mapsto x^p$ for
$-1\le p<0$. □

Using Lemma 5, we can easily prove similar statements for $\Phi_D(G)$. Define the
function

$$\phi(x) := \left(\frac{(1+x)^p - 1}{(1+x)^p - x^p}\right)^{1/2}. \qquad (26)$$

It is readily seen that $\phi(1/x) = 1/\phi(x)$.

Lemma 6 Consider matrices $D, G > 0$, and a scalar $k>0$. For $-1<p<1$, $p\neq 0$,

$$G\le kD \ \text{ implies } \ \Phi_D(G) \le \phi(k)D, \qquad (27)$$
$$D\le kG \ \text{ implies } \ \Phi_D(G) \ge \phi(k)^{-1}D. \qquad (28)$$

Proof. We start with the case $0<p<1$, for which the function $x\mapsto x^p$ is
operator monotone (and concave). Then $G\le kD$ implies

$$D\big((D+G)^p - D^p\big)D \le D\big((D+kD)^p - D^p\big)D = \big((1+k)^p - 1\big)D^{2+p}.$$

By Lemma 5, we also have

$$\big((D+G)^p - G^p\big)^{-1} \le \big((D+kD)^p - (kD)^p\big)^{-1} = \big((1+k)^p - k^p\big)^{-1}D^{-p}.$$

Joint monotonicity of the geometric mean then yields

$$\Phi_D(G) \le \big((1+k)^p - 1\big)D^{2+p} \# \big((1+k)^p - k^p\big)^{-1}D^{-p} = \phi(k)D,$$

which is (27).

To prove (28), $D\le kG$ similarly implies

$$\big((D+G)^p - G^p\big)^{-1} \ge \big((1+k)^p - 1\big)^{-1}G^{-p}.$$

Using Lemma 5 again, we have

$$D\big((D+G)^p - D^p\big)D \ge D\big((kG+G)^p - (kG)^p\big)D = \big((1+k)^p - k^p\big)DG^pD.$$

For the geometric mean we get

$$\Phi_D(G) \ge \big((1+k)^p - 1\big)^{-1}G^{-p} \# \big((1+k)^p - k^p\big)DG^pD
= \phi(k)^{-1}\big(G^{-p} \# DG^pD\big) = \phi(k)^{-1}D.$$

In the last line we have used

$$G^{-p} \# DG^pD = D^{1/2}\big(D^{-1/2}G^{-p}D^{-1/2} \# D^{1/2}G^pD^{1/2}\big)D^{1/2} = D.$$

For $-1<p<0$, inequalities (27) and (28) are proven in exactly the same
way. On one hand, since $x\mapsto x^p$ is now inversely operator monotone, the
inequality signs are reversed, and the same applies for the inequality of Lemma
5. However, this reversal is counteracted by the fact that in this regime $\Phi_D(G)$
is defined by (24), which has additional sign changes, hence the inequalities
of the Lemma still remain valid. □

From this Lemma we get inequalities for $\lambda_1^{\downarrow}$ and $\lambda_1^{\uparrow}$ of $GD^{-1}$ and $\Phi_D(G)D^{-1}$,
valid for $-1<p<1$. Assume first that $\lambda_1^{\downarrow}(GD^{-1}) = K$. This amounts to
$G\le KD$, and by the first statement of Lemma 6, implies $\Phi_D(G)\le\phi(K)D$,
hence $\lambda_1^{\downarrow}(\Phi_D(G)D^{-1}) \le \phi(K)$. Thus we get

$$\lambda_1^{\downarrow}(\Phi_D(G)D^{-1}) \le \phi\big(\lambda_1^{\downarrow}(GD^{-1})\big). \qquad (29)$$

Then assume $\lambda_1^{\uparrow}(GD^{-1}) = k$, which means that $G\ge kD$, and by the second
statement of Lemma 6, $\Phi_D(G) \ge (1/\phi(1/k))D = \phi(k)D$. Thus, similarly,

$$\lambda_1^{\uparrow}(\Phi_D(G)D^{-1}) \ge \phi\big(\lambda_1^{\uparrow}(GD^{-1})\big). \qquad (30)$$

To combine (29) and (30) into an expression relating the metric distance
$\delta_\infty(\Phi_D(G), D)$ to $\delta_\infty(G, D)$, we introduce the function

$$h(x) = \log\phi(\exp(x)).$$

From $\phi(1/x) = 1/\phi(x)$, we see that $h$ is odd, $h(-x) = -h(x)$. Moreover, $h$
is monotonically increasing. Finally, we note that for $-1<p<1$, $h(x)/x$
achieves its maximum in $x = 0$, and

$$\lim_{x\to 0}\frac{h(x)}{x} = \frac{p}{2^{p+1}-2}. \qquad (31)$$
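The scalar ingredients of this argument can be explored numerically. The sketch below, in plain Python, uses the closed form $\phi(x) = \big(((1+x)^p-1)/((1+x)^p-x^p)\big)^{1/2}$, which is the expression forced by the computation in the proof of Lemma 6; the function names, the sample values of $p$ and the iteration are illustrative assumptions. It checks that $h(x)/x$ stays below $\beta = p/(2^{p+1}-2)$ on a grid and iterates the scalar specialisation of $\Phi_D$ to watch it converge to the fixed point $d$.

```python
import math

def phi(x, p):
    """phi(x) = sqrt(((1+x)^p - 1) / ((1+x)^p - x^p)), for x > 0 and p != 0."""
    return math.sqrt(((1 + x) ** p - 1) / ((1 + x) ** p - x**p))

def h(x, p):
    """h(x) = log phi(exp(x))."""
    return math.log(phi(math.exp(x), p))

def beta(p):
    """The constant p / (2^(p+1) - 2) appearing in (31)."""
    return p / (2 ** (p + 1) - 2)

def scalar_map(g, d, p):
    """Scalar specialisation of Phi_D; (23) and (24) reduce to the same formula."""
    return math.sqrt(d * d * ((d + g) ** p - d**p) / ((d + g) ** p - g**p))

exponents = (-0.9, -0.5, -0.1, 0.5, 0.9)
grid = [0.1 * k for k in range(1, 51)]
worst_excess = max(h(x, p) / x - beta(p) for p in exponents for x in grid)
limit_error = max(abs(h(1e-4, p) / 1e-4 - beta(p)) for p in exponents)

# Fixed-point iteration g -> Phi_d(g), started away from d, for p = q - 2 with q = 1.5.
d, g = 2.0, 5.0
for _ in range(200):
    g = scalar_map(g, d, -0.5)
print(worst_excess, limit_error, g)
```

In the scalar case $\Phi_d(kd) = \phi(k)\,d$ exactly, so each iteration shrinks $|\log(g/d)|$ by at least the factor $\beta$, and the iterates approach $d$ geometrically.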



Taking the logarithm of (29) and (30) gives

$$y_1 := \log\lambda_1^{\downarrow}(\Phi_D(G)D^{-1}) \le h\big(\log\lambda_1^{\downarrow}(GD^{-1})\big) =: h(x_1),$$
$$y_2 := \log\lambda_1^{\uparrow}(\Phi_D(G)D^{-1}) \ge h\big(\log\lambda_1^{\uparrow}(GD^{-1})\big) =: h(x_2),$$

where we also introduced some shorthand. These two inequalities can be combined
as $h(x_2)\le y_2\le y_1\le h(x_1)$, showing that the interval $[y_2,y_1]$ is completely
contained in $[h(x_2), h(x_1)]$. Therefore,

$$\max(|y_1|, |y_2|) \le \max(|h(x_1)|, |h(x_2)|).$$

Since $h$ is odd, $|h(x)| = h(|x|)$, and because $h$ is monotonically increasing,

$$\max(|y_1|, |y_2|) \le \max\big(h(|x_1|), h(|x_2|)\big) = h\big(\max(|x_1|, |x_2|)\big).$$

Now the left-hand side is nothing but $\delta_\infty(\Phi_D(G), D)$, and the right-hand side
is $h(\delta_\infty(G,D))$. By (31) it finally follows that

$$\delta_\infty(\Phi_D(G), D) \le \beta\,\delta_\infty(G, D),$$

which proves that $D = C$ is the only stationary point of $f(D)$. This finishes
the proof of Proposition 1.

8 Value of $f(D)$ for non-invertible and/or unbounded $D$

In this Section we study the behaviour of $f(D)$ for $D$ on the boundary of the
PSD cone, that is, for non-invertible and/or unbounded $D$. This will result
in a proof of Proposition 2. As mentioned above, this Proposition is used to
inductively prove the statement $f(D)\le f(C)$, and relies on the induction
hypothesis that $f(D)\le f(C)$ holds for matrices of lesser dimension.

We consider blocks $C$ and $D$ of size $d\times d$. Let $P$ be a projector on a $d'$-dimensional
subspace of the full $d$-dimensional space, and let $P^{\perp} = 1 - P$ be
the projector on the complementary subspace.

We consider $D$ of the form $D = D' + \epsilon P$, where $D'$ is bounded and invertible on
the complementary subspace ($P^{\perp}$) and 0 elsewhere. We study non-invertible
$D$ by taking $P$ the projector on the kernel of $D$ and letting $\epsilon$ tend to zero.
Likewise, we study unbounded $D$ by taking $P^{\perp}$ the projector on the subspace
on which $D$ is bounded and letting $\epsilon$ tend to infinity.

Thus $D^{-1} = D'^{-1}\oplus P/\epsilon$. Denote $Q := P^{\perp}C^2P^{\perp}$, $R := PC^2P$ and $G' =
D'^{-1/2}QD'^{-1/2}$, thus $G = G'\oplus R/\epsilon$. Then

$$f(D) = \big(\mathrm{Tr}(G'+D')^q - \mathrm{Tr}\,G'^q - \mathrm{Tr}\,D'^q\big) + \big(\mathrm{Tr}(R/\epsilon + \epsilon P)^q - \mathrm{Tr}(R/\epsilon)^q - \mathrm{Tr}(\epsilon P)^q\big). \qquad (32)$$

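We assume validity of the induction hypothesis on the complementary subspace,
namely that

$$\mathrm{Tr}(G'+D')^q - \mathrm{Tr}\,G'^q - \mathrm{Tr}\,D'^q$$

is maximal for $D' = G'$. Noting that the role of block $C$ in the definition of
$f(D)$ is taken up here by $Q^{1/2}$, $D' = G'$ corresponds to $D' = Q^{1/2}$.

We now show that when $q<2$, the second term tends to 0 if $\epsilon$ tends to 0. By
the Lieb-Thirring inequality, and restricting to the subspace of $P$,

$$\mathrm{Tr}(R + \epsilon^2P)^q = \mathrm{Tr}\big(R(P + \epsilon^2R^{-1})\big)^q \le \mathrm{Tr}\big(R^q(P + \epsilon^2R^{-1})^q\big).$$

Since the non-zero eigenvalues of $P + \epsilon^2R^{-1}$ are all $\ge 1$, we have $(P + \epsilon^2R^{-1})^q \le
(P + \epsilon^2R^{-1})^2$ for $q\le 2$, so that also

$$\mathrm{Tr}(R + \epsilon^2P)^q \le \mathrm{Tr}\big(R^q(P + \epsilon^2R^{-1})^2\big).$$

Hence

$$\mathrm{Tr}(R/\epsilon + \epsilon P)^q - \mathrm{Tr}(R/\epsilon)^q - \mathrm{Tr}(\epsilon P)^q$$
$$= \epsilon^{-q}\big(\mathrm{Tr}(R + \epsilon^2P)^q - \mathrm{Tr}\,R^q - \mathrm{Tr}(\epsilon^2P)^q\big)$$
$$\le \epsilon^{-q}\big(\mathrm{Tr}(R + \epsilon^2P)^q - \mathrm{Tr}\,R^q\big)$$
$$\le \epsilon^{-q}\big(\mathrm{Tr}(R^q(P + \epsilon^2R^{-1})^2) - \mathrm{Tr}\,R^q\big)$$
$$= \epsilon^{-q}\big(2\epsilon^2\,\mathrm{Tr}\,R^{q-1} + \epsilon^4\,\mathrm{Tr}\,R^{q-2}\big)$$
$$= 2\epsilon^{2-q}\,\mathrm{Tr}\,R^{q-1} + \epsilon^{4-q}\,\mathrm{Tr}\,R^{q-2}.$$

It is easily seen that for values of $q<2$, this tends to 0 if $\epsilon$ does.

The proof that $\mathrm{Tr}(R/\epsilon + \epsilon P)^q - \mathrm{Tr}(R/\epsilon)^q - \mathrm{Tr}(\epsilon P)^q$ tends to 0 if $\epsilon$ tends to
infinity is completely similar.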
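The limiting behaviour of the second term in (32) is easiest to see in the scalar case, where $R = r > 0$ and $P = 1$. The sketch below, in plain Python, compares the term with the bound $2\epsilon^{2-q}r^{q-1} + \epsilon^{4-q}r^{q-2}$ derived above; the function names and the sample values are illustrative.

```python
def second_term(r, eps, q):
    """Scalar version of the second bracket in (32): (r/eps + eps)^q - (r/eps)^q - eps^q."""
    return (r / eps + eps) ** q - (r / eps) ** q - eps**q

def upper_bound(r, eps, q):
    """Scalar version of the bound 2 eps^(2-q) r^(q-1) + eps^(4-q) r^(q-2)."""
    return 2 * eps ** (2 - q) * r ** (q - 1) + eps ** (4 - q) * r ** (q - 2)

r, q = 1.0, 1.5
epsilons = (1e-1, 1e-2, 1e-3)
terms = [second_term(r, eps, q) for eps in epsilons]
bounds = [upper_bound(r, eps, q) for eps in epsilons]
for eps, t, b in zip(epsilons, terms, bounds):
    print(f"eps = {eps:g}: term = {t:.6f}, bound = {b:.6f}")
```

Both columns shrink like $\epsilon^{2-q}$ as $\epsilon$ decreases, which is the mechanism behind the vanishing of the second term for $q < 2$.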

By the induction hypothesis, the first term in (32) obeys the inequality

$$\mathrm{Tr}(G'+D')^q - \mathrm{Tr}\,G'^q - \mathrm{Tr}\,D'^q \le (2^q-2)\,\mathrm{Tr}(Q^{1/2})^q.$$

Now $Q = P^{\perp}C^2P^{\perp}$ means that in some basis $Q$ is a principal submatrix of
$C^2$. Hence, by eigenvalue interlacing, and by the non-negativity of $C$ and $Q$,
$\mathrm{Tr}\,Q^{q/2} \le \mathrm{Tr}(C^2)^{q/2} = \mathrm{Tr}\,C^q$, so that

$$\mathrm{Tr}(G'+D')^q - \mathrm{Tr}\,G'^q - \mathrm{Tr}\,D'^q \le f(C).$$

Combining the two terms proves $f(D)\le f(C)$ for non-invertible/unbounded
$D$ with a $d'$-dimensional bounded invertible part, based on the induction hypothesis
$f(D)\le f(C)$ for dimension $d'$. This finishes the proof of Proposition
2.



9 Final Remark

The method used to prove that $G = D$ is the unique solution of (22) can be
employed for other matrix equations. Here we illustrate this for the equation

$$AX^qA = XA^qX, \qquad A > 0, \qquad (33)$$

and show that $X = A$ is its unique PSD solution when $0<q<2$. Again we
can use Lemma 1 to solve the right-hand side for $X$, giving the equation

$$X = (AX^qA) \# A^{-q} = A\big(X^q \# A^{-2-q}\big)A.$$

This defines the map

$$X \mapsto \Psi_A(X) := A\big(X^q \# A^{-2-q}\big)A.$$

We show that

$$\delta_\infty(\Psi_A(X), A) \le (q/2)\,\delta_\infty(X, A). \qquad (34)$$

To do so, we consider the log-majorisation version ([4], Theorem 3.1) of Furuta's
inequality [8]. Let $\#_\alpha$ denote the $\alpha$-power mean, then for $A, B > 0$,
$0\le\alpha\le 1$, $p>0$ and $r\le\min(\alpha,\alpha p)$

^{l-a)/2^a^il-a)/2 ^^^^ ^^p-r ^^(l-a)r-/2a^p^(l-a)r/2a ) j _ 

Substituting A by A^, B by X''^, a by 1/2, p by q/2, and r by -1/2 yields 

^V2;^-l^V2 ^j^^(^(l+9)/2^^-l-9/2^j^-l^<?^-l-9/2^1/2^(l+g)/2^1/(g/2) 
= (^(l+9)/2 ^^l+?/2 j^g^l+9/2 ^ -l/2^(l+9)/2^ l/(g/2) _ 
From this log-majorisation relation follows directly that



\\\\og{A^'^X-^A^'^)\\\ 

> (l/(g/2))||| log(A(l+^)/2(^l+g/2^g^l+,/2)-l/2^(l+,)/2^ 

for any unitarily invariant norm, hence (34) indeed holds.